
    Hidden Markov Models and their Application for Predicting Failure Events

    We show how Markov mixed membership models (MMMM) can be used to predict the degradation of assets. We model the degradation path of individual assets to predict overall failure rates. Instead of a separate distribution for each hidden state, we use hierarchical mixtures of distributions in the exponential family: the observation distribution of each state is a finite mixture of a small set of (simpler) distributions shared across all states. Using tied-mixture observation distributions offers several advantages. The mixtures act as a regularization for typically very sparse problems, and they reduce the computational effort of the learning algorithm since fewer distributions have to be estimated. Sharing mixtures also shares statistical strength between the Markov states and thus enables transfer learning. Finally, we determine for individual assets the trade-off between the risk of failure and extended operating hours by combining an MMMM with a partially observable Markov decision process (POMDP) to dynamically optimize the policy for when and how to maintain the asset. (To appear in the proceedings of ICCS 2020; EasyChair Preprint no. 3183, 2020.)
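
    As a loose illustration of the tied-mixture idea described above, the sketch below runs a scaled forward pass through a small hidden Markov model in which every state's emission distribution is a mixture over one shared pool of Gaussian components, with only the mixture weights differing by state. All parameter values, the Gaussian choice, and the left-to-right transition structure are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

# Shared pool of Gaussian components reused by every hidden state (illustrative values).
comp_mu = np.array([0.0, 1.0, 2.5, 4.0])
comp_sd = np.array([0.5, 0.5, 0.8, 1.0])

# State-specific mixture weights over the shared components (rows sum to 1):
# this is the "tied-mixture" emission structure.
weights = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.5, 0.3, 0.1],
    [0.0, 0.1, 0.3, 0.6],
])

pi = np.full(3, 1.0 / 3.0)            # initial state distribution
A = np.array([[0.90, 0.08, 0.02],     # assumed left-to-right degradation chain
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

def emission_probs(x):
    """P(x | state): weighted sum of the shared component densities."""
    dens = norm.pdf(x, comp_mu, comp_sd)   # one density per shared component
    return weights @ dens                  # one value per hidden state

def forward_loglik(obs):
    """Log-likelihood of an observation sequence via a scaled forward pass."""
    alpha = pi * emission_probs(obs[0])
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ A) * emission_probs(x)
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Toy degradation signal drifting upward over time.
print(forward_loglik(np.array([0.1, 0.3, 1.2, 2.8, 4.1])))
```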

    Crude incidence in two-phase designs in the presence of competing risks.

    Background: In many studies some information might not be available for the whole cohort; some covariates, or even the outcome, might be ascertained only in selected subsamples. These studies are part of a broad category termed two-phase studies. Common examples include the nested case-control and the case-cohort designs. For two-phase studies, appropriate weighted survival estimates have been derived; however, no estimator of cumulative incidence accounting for competing events has been proposed. This is relevant in the presence of multiple types of events, where estimation of event-type-specific quantities is needed for evaluating outcome. Methods: We develop a non-parametric estimator of the cumulative incidence function of events accounting for possible competing events. It handles a general sampling design by weights derived from the sampling probabilities. The variance is derived from the influence function of the subdistribution hazard. Results: The proposed method shows good performance in simulations. It is applied to estimate the crude incidence of relapse in childhood acute lymphoblastic leukemia in groups defined by a genotype not available for everyone in a cohort of nearly 2000 patients, where death due to toxicity acted as a competing event. In a second example the aim was to estimate engagement in care in a cohort of HIV patients in a resource-limited setting, where for some patients the outcome itself was missing due to loss to follow-up. A sampling-based approach was used to ascertain the outcome in a subsample of lost patients and to obtain a valid estimate of connection to care. Conclusions: A valid estimator for the cumulative incidence of events accounting for competing risks under a general sampling design from an infinite target population is derived.
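
    A minimal sketch of the kind of weighted estimator described above, assuming inverse-probability weights equal to one over each subject's phase-two sampling probability: a weighted Aalen-Johansen-type cumulative incidence computation. The function name, the toy event coding, and the weights are invented for illustration; the paper's influence-function-based variance is not reproduced here.

```python
import numpy as np
import pandas as pd

def weighted_cuminc(time, event, weight, cause=1):
    """Weighted Aalen-Johansen-type estimate of P(T <= t, event = cause)."""
    df = pd.DataFrame({"t": time, "e": event, "w": weight}).sort_values("t")
    times = np.unique(df.loc[df.e > 0, "t"])
    surv, cif, out = 1.0, 0.0, []
    for t in times:
        at_risk = df.loc[df.t >= t, "w"].sum()             # weighted risk set
        d_cause = df.loc[(df.t == t) & (df.e == cause), "w"].sum()
        d_any = df.loc[(df.t == t) & (df.e > 0), "w"].sum()
        cif += surv * d_cause / at_risk                    # cause-specific jump
        surv *= 1.0 - d_any / at_risk                      # overall survival update
        out.append((t, cif))
    return pd.DataFrame(out, columns=["time", "cuminc"])

# Toy two-phase data: event 1 = relapse, 2 = competing death, 0 = censored;
# weight = 1 / phase-two sampling probability (invented values).
t = [2, 3, 4, 5, 7, 8, 10, 12]
e = [1, 0, 2, 1, 0, 2, 1, 0]
w = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
print(weighted_cuminc(t, e, w, cause=1))
```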

    Accounting for Population Stratification in Practice: A Comparison of the Main Strategies Dedicated to Genome-Wide Association Studies

    Genome-Wide Association Studies are powerful tools to detect genetic variants associated with diseases. Their results have, however, been questioned, in part because of the bias induced by population stratification. This bias is a consequence of systematic differences in allele frequencies due to differences in sample ancestries and can lead to both false positive and false negative findings. Many strategies are available to account for stratification, but their performances differ, for instance according to the type of population structure, the disease susceptibility locus minor allele frequency, the degree of sampling imbalance, or the sample size. We focus on the type of population structure and propose a comparison of the most commonly used methods to deal with stratification: Genomic Control, principal-component-based methods such as those implemented in Eigenstrat, adjusted regressions, and meta-analysis strategies. Our assessment of the methods is based on a large simulation study, involving several scenarios corresponding to many types of population structures. We focused on both false positive rate and power to determine which methods perform best. Our analysis showed that, in the absence of population structure, none of the tests introduced bias or decreased power, except for the meta-analyses. When the population is stratified, adjusted logistic regressions and Eigenstrat are the best solutions to account for stratification, even though only the logistic regressions are able to consistently maintain correct false positive rates. This study provides more details about these methods. Their advantages and limitations in different stratification scenarios are highlighted in order to propose practical guidelines to account for population stratification in Genome-Wide Association Studies.
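
    For concreteness, the sketch below renders two of the compared corrections on simulated data: genomic control, which rescales 1-df trend statistics by the inflation factor lambda, and a logistic regression adjusted for top principal components of the genotype matrix (the Eigenstrat-style idea). The simulated data, the number of components retained, and the use of statsmodels are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import chi2
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, m = 500, 1000
geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # 0/1/2 genotype counts
pheno = rng.binomial(1, 0.5, size=n)                      # case/control status

# Genomic control: the 1-df Armitage trend statistic per SNP equals N * corr^2,
# and lambda is the median statistic divided by the chi-square(1) median.
r = np.array([np.corrcoef(geno[:, j], pheno)[0, 1] for j in range(m)])
trend_stats = n * r ** 2
lam = np.median(trend_stats) / chi2.ppf(0.5, df=1)
corrected_stats = trend_stats / max(lam, 1.0)             # rescaled statistics

# Eigenstrat-style adjustment: logistic regression of the phenotype on one SNP
# plus the top principal components of the standardized genotype matrix.
std = (geno - geno.mean(0)) / (geno.std(0) + 1e-9)
pcs = np.linalg.svd(std, full_matrices=False)[0][:, :10]
X = sm.add_constant(np.column_stack([geno[:, 0], pcs]))
fit = sm.Logit(pheno, X).fit(disp=0)
print(f"lambda = {lam:.3f}, PC-adjusted p-value for SNP 0 = {fit.pvalues[1]:.3f}")
```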

    Joint Analysis for Genome-Wide Association Studies in Family-Based Designs

    In family-based data, association information can be partitioned into the between-family information and the within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683–691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs which performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, we learn from the results of case-control studies (Skol et al. Nature Genetics. 2006, 209–213) that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family information and the within-family information is used in the second stage to test association at the selected markers. Through extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.
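
    A minimal sketch of the joint-analysis principle, assuming the between-family and within-family statistics are independent and standard normal under the null: screen on the between-family Z score, then combine both scores with weights proportional to the square root of the information each stage carries. The weights, threshold, and function name are illustrative; in practice the joint significance threshold must also account for the stage-one selection.

```python
import numpy as np
from scipy.stats import norm

def joint_test(z_between, z_within, info_between=0.5, alpha_screen=0.01):
    """Screen on the between-family statistic, then jointly test selected markers."""
    if 2 * norm.sf(abs(z_between)) > alpha_screen:
        return None                               # marker not selected in stage 1
    w1 = np.sqrt(info_between)                    # share of information in stage 1
    w2 = np.sqrt(1.0 - info_between)              # share of information in stage 2
    z_joint = w1 * z_between + w2 * z_within      # N(0,1) under the null, given independence
    # Note: a real analysis sets the joint significance threshold to account
    # for the stage-one selection rather than using this raw p-value directly.
    return 2 * norm.sf(abs(z_joint))

print(joint_test(z_between=3.1, z_within=2.4))
```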

    Assessment of BED HIV-1 Incidence Assay in Seroconverter Cohorts: Effect of Individuals with Long-Term Infection and Importance of Stable Incidence

    BACKGROUND: Performance of the BED assay in estimating HIV-1 incidence has previously been evaluated by using longitudinal specimens from persons with incident HIV infections, but questions remain about its accuracy. We sought to assess its performance in three longitudinal cohorts from Thailand where HIV-1 CRF01_AE and subtype B' dominate the epidemic. DESIGN: BED testing was conducted in two longitudinal cohorts with only incident infections (a military conscript cohort and an injection drug user cohort) and in one longitudinal cohort (an HIV-1 vaccine efficacy trial cohort) that also included long-term infections. METHODS: Incidence estimates were generated conventionally (based on the number of annual seroconversions) and by using BED test results in the three cohorts. Adjusted incidence was calculated where appropriate. RESULTS: For each longitudinal cohort the BED incidence estimates and the conventional incidence estimates were similar when only newly infected persons were tested, whether infected with CRF01_AE or subtype B'. When the analysis included persons with long-term infections (to mimic a true cross-sectional cohort), BED incidence estimates were higher, although not significantly, than the conventional incidence estimates. After adjustment, the BED incidence estimates were closer to the conventional incidence estimates. When the conventional incidence varied over time, as in the early phase of the injection drug user cohort, the difference between the two estimates increased, but not significantly. CONCLUSIONS: Evaluation of the performance of incidence assays requires the inclusion of a substantial number of cohort-derived specimens from individuals with long-term HIV infection and, ideally, the use of cohorts in which incidence remained stable. Appropriate adjustments of the BED incidence estimates generate estimates similar to those generated conventionally.
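
    A minimal sketch of a cross-sectional incidence calculation of the kind the BED assay enables, with a generic correction for long-term infections misclassified as recent. The specific adjustment formula, window period, false-recent rate, and counts are illustrative assumptions and not necessarily those applied in the study.

```python
def bed_incidence(n_recent, n_positive, n_negative,
                  window_years=0.5, false_recent_rate=0.0):
    """Annualized incidence estimate from one cross-sectional survey."""
    # Remove the expected number of long-term infections misclassified as
    # recent, then divide by person-time at risk, approximated here by the
    # HIV-negative count times the mean recency window.
    adj_recent = max(n_recent - false_recent_rate * n_positive, 0.0)
    return adj_recent / (n_negative * window_years)

unadjusted = bed_incidence(40, 600, 4000)
adjusted = bed_incidence(40, 600, 4000, false_recent_rate=0.02)
print(f"unadjusted: {unadjusted:.4f}, adjusted: {adjusted:.4f} per person-year")
```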

    Genetic risk factors for cerebrovascular disease in children with sickle cell disease: design of a case-control association study and genomewide screen

    BACKGROUND: The phenotypic heterogeneity of sickle cell disease is likely the result of multiple genetic factors and their interaction with the sickle mutation. High transcranial doppler (TCD) velocities define a subgroup of children with sickle cell disease who are at increased risk for developing ischemic stroke. The genetic factors leading to the development of a high TCD velocity (i.e., cerebrovascular disease) and ultimately to stroke are not well characterized. METHODS: We have designed a case-control association study to elucidate the role of genetic polymorphisms as risk factors for cerebrovascular disease, as measured by a high TCD velocity, in children with sickle cell disease. The study will consist of two parts, a candidate gene study and a genomewide screen, and will be performed in 230 cases and 400 controls. Cases will include 130 patients (TCD ≥ 200 cm/s) randomized in the Stroke Prevention Trial in Sickle Cell Anemia (STOP) study as well as 100 other patients found to have high TCD in STOP II screening. Four hundred sickle cell disease patients with a normal TCD velocity (TCD < 170 cm/s) will be controls. The candidate gene study will involve the analysis of 28 genetic polymorphisms in 20 candidate genes. The polymorphisms include mutations in coagulation factor genes (Factor V, Prothrombin, Fibrinogen, Factor VII, Factor XIII, PAI-1) and in genes involved in platelet activation/function (GpIIb/IIIa, GpIb IX-V, GpIa/IIa), vascular reactivity (ACE), endothelial cell function (MTHFR, thrombomodulin, VCAM-1, E-Selectin, L-Selectin, P-Selectin, ICAM-1), inflammation (TNFα), lipid metabolism (Apo A1, Apo E), and cell adhesion (VCAM-1, E-Selectin, L-Selectin, P-Selectin, ICAM-1). We will perform a genomewide screen of validated single nucleotide polymorphisms (SNPs) in pooled DNA samples from 230 cases and 400 controls to study the possible association of additional polymorphisms with the high-risk phenotype. High-throughput SNP genotyping will be performed through MALDI-TOF technology using Sequenom's MassARRAY™ system. DISCUSSION: It is expected that this study will yield important information on genetic risk factors for the cerebrovascular disease phenotype in sickle cell disease by clarifying the role of candidate genes in the development of high TCD. The genomewide screen for a large number of SNPs may uncover the association of novel polymorphisms with cerebrovascular disease and stroke in sickle cell disease.
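
    As a sketch of the per-polymorphism analysis such a case-control design implies, the snippet below compares allele counts between cases (high TCD) and controls (normal TCD) with a 2x2 chi-square test and reports an allelic odds ratio. The counts and the choice of an allelic (rather than genotypic) test are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

def allelic_test(case_alt, case_n, ctrl_alt, ctrl_n):
    """2x2 allele-count test: rows = cases/controls, columns = alt/ref alleles."""
    table = np.array([
        [case_alt, 2 * case_n - case_alt],    # each subject contributes 2 alleles
        [ctrl_alt, 2 * ctrl_n - ctrl_alt],
    ])
    stat, p, _, _ = chi2_contingency(table, correction=False)
    odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
    return stat, p, odds_ratio

# Hypothetical counts for one candidate polymorphism in 230 cases and 400 controls.
print(allelic_test(case_alt=92, case_n=230, ctrl_alt=120, ctrl_n=400))
```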

    Population Substructure and Control Selection in Genome-Wide Association Studies

    Determining the relevance of demanding classical epidemiologic criteria for control selection, and handling population stratification (PS) robustly, represent major challenges in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor λ of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (λ of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r2<0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved better control of type I error (reducing λ to 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
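
    The sketch below illustrates a generic permutation check of how many principal components reflect real substructure: each SNP's genotypes are permuted independently across individuals to break any structure, the PCA is recomputed, and an observed eigenvalue is retained only if it exceeds the corresponding permutation quantile. This is a hedged rendering of the general idea, not the paper's exact permutation procedure, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_eigenvalues(geno, k):
    """Top-k eigenvalues of the standardized genotype covariance matrix."""
    std = (geno - geno.mean(0)) / (geno.std(0) + 1e-9)
    s = np.linalg.svd(std, compute_uv=False)
    return (s ** 2)[:k] / (geno.shape[0] - 1)

def n_significant_pcs(geno, k=5, n_perm=10, quantile=0.95):
    """Count PCs whose eigenvalue exceeds the permutation-null quantile."""
    observed = top_eigenvalues(geno, k)
    null = np.empty((n_perm, k))
    for b in range(n_perm):
        shuffled = np.column_stack(
            [rng.permutation(col) for col in geno.T]   # break structure SNP by SNP
        )
        null[b] = top_eigenvalues(shuffled, k)
    return int(np.sum(observed > np.quantile(null, quantile, axis=0)))

# Toy genotypes: 300 individuals drawn from two subpopulations, 1000 SNPs.
freqs = rng.uniform(0.1, 0.9, size=(2, 1000))
labels = rng.integers(0, 2, 300)
geno = rng.binomial(2, freqs[labels]).astype(float)
print("PCs retained:", n_significant_pcs(geno))
```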

    How to handle mortality when investigating length of hospital stay and time to clinical stability

    Background: Hospital length of stay (LOS) and time for a patient to reach clinical stability (TCS) have increasingly become important outcomes when investigating ways in which to combat Community Acquired Pneumonia (CAP). Difficulties arise when deciding how to handle in-hospital mortality. Ad-hoc approaches that are commonly used to handle time-to-event outcomes with mortality can give disparate results and provide conflicting conclusions based on the same data. To ensure compatibility among studies investigating these outcomes, this type of data should be handled in a consistent and appropriate fashion. Methods: Using both simulated data and data from the international Community Acquired Pneumonia Organization (CAPO) database, we evaluate two ad-hoc approaches for handling mortality when estimating the probability of hospital discharge and clinical stability: 1) restricting analysis to those patients who lived, and 2) assigning individuals who die the "worst" outcome (right-censoring them at the longest recorded LOS or TCS). Estimated probability distributions based on these approaches are compared with right-censoring the individuals who died at time of death (the complement of the Kaplan-Meier (KM) estimator), and treating death as a competing risk (the cumulative incidence estimator). Tests for differences in probability distributions based on the four methods are also contrasted. Results: The two ad-hoc approaches give different estimates of the probability of discharge and clinical stability. Analysis restricted to patients who survived is conceptually problematic, as estimation is conditioned on events that happen at a future time. Estimation based on assigning those patients who died the worst outcome (longest LOS and TCS) coincides with the complement of the KM estimator based on the subdistribution hazard, which has been previously shown to be equivalent to the cumulative incidence estimator. However, in either case the time to in-hospital mortality is ignored, preventing simultaneous assessment of patient mortality in addition to LOS and/or TCS. The power to detect differences in underlying hazards of discharge between patient populations differs for test statistics based on the four approaches, and depends on the underlying hazard ratio of mortality between the patient groups. Conclusions: Treating death as a competing risk gives estimators which address the clinical questions of interest, and allows for simultaneous modelling of both in-hospital mortality and TCS/LOS. This article advocates treating mortality as a competing risk when investigating other time-related outcomes.
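
    A minimal sketch of the comparison made above, assuming the lifelines package is available (any survival library with Kaplan-Meier and Aalen-Johansen estimators would do): the complement of the KM estimator that censors deaths at the time of death versus the cumulative incidence of discharge with death treated as a competing risk. The toy data and event coding are invented.

```python
import pandas as pd
from lifelines import KaplanMeierFitter, AalenJohansenFitter

# Toy data; event codes: 1 = discharged, 2 = died in hospital, 0 = censored.
df = pd.DataFrame({
    "los":   [3, 4, 5, 7, 8, 10, 12, 14, 15, 20],
    "event": [1, 1, 2, 1, 2, 1,  0,  1,  2,  1],
})

# (a) Complement of the Kaplan-Meier estimator, censoring deaths at the time
# of death; this tends to overstate the probability of discharge.
kmf = KaplanMeierFitter()
kmf.fit(df["los"], event_observed=(df["event"] == 1))
prob_discharge_km = 1 - kmf.survival_function_

# (b) Cumulative incidence of discharge with death as a competing risk.
ajf = AalenJohansenFitter()
ajf.fit(df["los"], df["event"], event_of_interest=1)
prob_discharge_ci = ajf.cumulative_density_

print(prob_discharge_km.tail(1))
print(prob_discharge_ci.tail(1))
```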

    A Robust Statistical Method for Association-Based eQTL Analysis

    Background: It has been well established that the theoretical kernel of the recently surging genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for this influence, either by predicting the structure parameters or by correcting the inflation in the test statistic due to stratification, these may not be feasible or may impose further statistical problems in practical implementation. Methodology: We propose here a novel statistical method to control spurious LD in GWAS arising from population structure by incorporating a control marker into the test for significance of genetic association between a polymorphic marker and phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and properly accounts for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations. Results/Conclusions: The analyses show that the new method confers improved statistical power for detecting genuine associations.
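
    A minimal sketch of the control-marker idea, assuming a simple regression rendering: the marker of interest is tested for association with an expression trait while conditioning on an unlinked control marker whose apparent effect arises only from population structure. The simulated populations, the OLS formulation, and all parameter values are illustrative assumptions rather than the paper's exact test statistic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
pop = rng.integers(0, 2, n)                       # two hidden subpopulations

# Both markers differ in frequency between subpopulations; the trait mean also
# differs by subpopulation, so any marker-trait association here is spurious.
test_snp = rng.binomial(2, np.where(pop == 1, 0.7, 0.3)).astype(float)
control_snp = rng.binomial(2, np.where(pop == 1, 0.8, 0.2)).astype(float)
expression = 1.5 * pop + rng.normal(size=n)       # stratified expression trait

naive = sm.OLS(expression, sm.add_constant(test_snp)).fit()
adjusted = sm.OLS(
    expression, sm.add_constant(np.column_stack([test_snp, control_snp]))
).fit()

print("naive p-value:   ", naive.pvalues[1])      # typically spuriously small
print("adjusted p-value:", adjusted.pvalues[1])   # much of the structure absorbed
```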